Entity coreference resolution
Subproblems

Broad-referring expressions

"It" is a particularly difficult case for coreference resolution. It can refer to singular inanimate objects, some animals, abstractions or events, or nothing specific at all (pleonastic uses). "They" is slightly easier, but still more difficult than other pronouns. "This" and "that" can also refer to abstractions, which makes their range of possible referents rather broad (McShane and Babkin, 2015) [McShane, M., & Babkin, P. (2015). Resolving Difficult Referring Expressions, 1–21.]. However, in most cases I found in OntoNotes, they are followed by a noun, as in "this area" or "this facility".

Types of coreferencing expressions

TODO: referential hierarchies of Ariel (1988) [M. Ariel. 1988. Referring and accessibility. Journal of Linguistics, pages 65–87.] or Gundel et al. (1993) [J. K. Gundel, N. Hedberg, and R. Zacharski. 1993. Cognitive status and the form of referring expressions in discourse. Language, 69:274–307.]

Approaches

Classified based on inferencing method

* Rule-based
* Machine learning
* Inference-based: Inoue et al. (2012) [Inoue, N., Ovchinnikova, E., Inui, K., & Hobbs, J. (2012). Coreference Resolution with ILP-based Weighted Abduction. In COLING (pp. 1291-1308).]

Classified based on type of evidence

Discourse-based

Discourse-based methods take into account aspects of discourse such as coherence and centering. From Lappin and Leass (1994) [Lappin, S., & Leass, H. J. (1994). An Algorithm for Pronominal Anaphora Resolution. Computational Linguistics, 20(4), 535–561. Retrieved from http://dl.acm.org/citation.cfm?id=203989]:

"Discourse Based Methods Most of the work in this area seeks to formulate general principles of discourse structure and interpretation and to integrate methods of anaphora resolution into a computational model of discourse interpretation (and sometimes of generation as well).
Sidner (1981, 1983), Grosz, Joshi, and Weinstein (1983, 1986), Grosz and Sidner (1986), Brennan, Friedman, and Pollard (1987), and Webber (1988) present different versions of this approach. Dynamic properties of discourse, especially coherence and focusing, are invoked as the primary basis for identifying antecedence candidates; selecting a candidate as the antecedent of a pronoun in discourse involves additional constraints of a syntactic, semantic, and pragmatic nature."

Potential problems:

* From Lappin and Leass (1994): "... assign too dominant a role to coherence and focus in antecedent selection. As a result, they establish a strong preference for intersentential over intrasentential anaphora resolution. This is the case with the anaphora resolution algorithm described by Brennan, Friedman, and Pollard (1987)."
* Alshawi (1987, p. 62; as cited in Lappin and Leass, 1994): an algorithm or model that relies on the relative salience of all entities evoked by a text, with a mechanism for removing or filtering entities whose salience falls below a threshold, is preferable to models that "make assumptions about a single (if shifting) focus of attention."

Mixed models

Mixed models combine syntactic, semantic, and discourse factors, among others. Examples: Lappin and Leass (1994), Asher and Wada (1988), Carbonell and Brown (1988), and Rich and LuperFoy (1988).

Classified based on construction of coreference chain

: See also: Ng (2010) [Ng, V. (2010). Supervised Noun Phrase Coreference Research: The First Fifteen Years. ACL ’10, (July), 1396–1411. http://doi.org/10.1109/TVCG.2007.24], Heng Ji's slide

To construct a coreference chain, one can either consider each element separately or match a candidate element to a partial chain. There are four major approaches to this problem:

* Mention-pair model: whether two mentions are coreferential or not
** Ref: (Soon et al. 2001; Ng and Cardie 2002; Ji et al., 2005; McCallum & Wellner, 2004; Nicolae & Nicolae, 2006)
* Entity-mention model: whether a mention and a preceding (partial) cluster are coreferential or not
** Ref: (Pasula et al. 2003; Luo et al. 2004; Yang et al. 2004, 2008; Daume & Marcu, 2005; Culotta et al., 2007; Lee et al., 2013 [Lee, H., Chang, A., Peirsman, Y., Chambers, N., Surdeanu, M., & Jurafsky, D. (2013). Deterministic Coreference Resolution Based on Entity-Centric, Precision-Ranked Rules. Computational Linguistics, 39(4), 885–916. doi:10.1162/COLI])
* Mention-ranking model: which of the preceding mentions is coreferential with a given mention
** Ref: Denis & Baldridge 2007, 2008
** Special case: rank two candidate NPs, called the tournament model by Iida et al. (2003) [Ryu Iida, Kentaro Inui, Hiroya Takamura, and Yuji Matsumoto. 2003. Incorporating contextual cues in trainable models for coreference resolution. In Proceedings of the EACL Workshop on The Computational Treatment of Anaphora.] and the twin-candidate model by Yang et al. (2003 [Xiaofeng Yang, Guodong Zhou, Jian Su, and Chew Lim Tan. 2003. Coreference resolution using competitive learning approach. In Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics, pages 176–183.]; 2008b [Xiaofeng Yang, Jian Su, and Chew Lim Tan. 2008b. A twin-candidate model for learning-based anaphora resolution. Computational Linguistics, 34(3):327–356.])
* Cluster-ranking model: which of the preceding clusters is coreferential with a given mention
** Ref: Rahman and Ng (2009) [Altaf Rahman and Vincent Ng. 2009. Supervised models for coreference resolution. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 968–977.]

From Ng (2010): "An important issue with ranking models that we have eluded so far concerns the identification of non-anaphoric NPs.
As a ranker simply imposes a ranking on candidate antecedents or preceding clusters, it cannot determine whether an NP is anaphoric (and hence should be resolved). To address this problem, Denis and Baldridge (2008) apply an independently trained anaphoricity classifier to identify non-anaphoric NPs prior to ranking, and Rahman and Ng (2009) propose a model that jointly learns coreference and anaphoricity"

Features

Mention-level features

Selectional preference

"given a pronoun to be resolved, its governing verb, and its grammatical role, we prefer a candidate antecedent that can be governed by the same verb and be in the same role."

Ref: (Dagan and Itai, 1990; Kehler et al., 2004b; Yang et al., 2005; Haghighi and Klein, 2009)

Cluster (entity)-level features

Open problems

From McShane and Babkin (2016) [Marjorie McShane, Petr Babkin. 2016. Resolving Difficult Referring Expressions]: "Among the more difficult referring expressions are so-called broad referring expressions, such as pronominal this and that ... In addition to untreated referring expressions, there are referring expressions that have been widely treated but have resisted high-precision results. One example is third person personal pronouns. The reason for the low precision is that resolution often requires specific world knowledge and reasoning, as illustrated by Winograd Schema examples like The man_i could not lift his son_k because he_i was so weak / he_k was so heavy (Levesque et al., 2012)."

TODO: http://aclweb.org/anthology/N/N15/N15-1082.pdf, Winograd schema. "advanced model for CR"

See also

* Coreference (psycholinguistics)

References

Category:Coreference resolution
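
As a rough illustration of the chain-construction models listed under "Classified based on construction of coreference chain", the Python sketch below contrasts a mention-pair model (with closest-first linking) and a cluster-ranking model on a three-mention toy input. Everything in it is an assumption made for illustration: the `Mention` fields, the hand-set weights in `pair_score` (a stand-in for a learned classifier), the use of the minimum pairwise score as the cluster-level score, and the fixed threshold that stands in for an anaphoricity decision. None of these choices is taken from the cited papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    head: str    # lowercased head word (hypothetical feature)
    gender: str  # "m", "f", or "unknown"
    number: str  # "sg" or "pl"


def pair_score(a: Mention, b: Mention) -> float:
    """Toy stand-in for a learned pairwise model; weights are made up."""
    both_known = "unknown" not in (a.gender, b.gender)
    if both_known and a.gender != b.gender:
        return 0.0  # hard gender clash
    score = 0.0
    if a.number == b.number:
        score += 0.5
    if a.head == b.head:
        score += 0.5
    return score


def mention_pair_chains(mentions, threshold=0.5):
    """Mention-pair model: link each mention to the closest preceding
    mention whose pairwise score reaches the threshold."""
    clusters = []    # list of lists of Mention
    cluster_of = []  # cluster_of[i] = index of the cluster holding mentions[i]
    for j, m in enumerate(mentions):
        target = None
        for i in range(j - 1, -1, -1):  # scan candidates right to left
            if pair_score(mentions[i], m) >= threshold:
                target = cluster_of[i]
                break
        if target is None:
            clusters.append([m])
            cluster_of.append(len(clusters) - 1)
        else:
            clusters[target].append(m)
            cluster_of.append(target)
    return [[x.text for x in c] for c in clusters]


def cluster_ranking_chains(mentions, threshold=0.5):
    """Cluster-ranking model: attach each mention to the best-scoring
    preceding cluster, or start a new cluster if none reaches the threshold."""
    clusters = []
    for m in mentions:
        best, best_score = None, threshold
        for c in clusters:
            # Cluster-level score: the weakest pairwise link (an assumption).
            s = min(pair_score(x, m) for x in c)
            if s >= best_score:  # on ties, the cluster created later wins
                best, best_score = c, s
        if best is None:
            clusters.append([m])
        else:
            best.append(m)
    return [[x.text for x in c] for c in clusters]


mentions = [
    Mention("Mr. Clinton", "clinton", "m", "sg"),
    Mention("Clinton", "clinton", "unknown", "sg"),
    Mention("she", "she", "f", "sg"),
]
pair_result = mention_pair_chains(mentions)
cluster_result = cluster_ranking_chains(mentions)
print(pair_result)     # [['Mr. Clinton', 'Clinton', 'she']]
print(cluster_result)  # [['Mr. Clinton', 'Clinton'], ['she']]
```

On this input the mention-pair sketch links "she" to the gender-ambiguous "Clinton" and so builds one inconsistent chain, while the cluster-level score rejects the link because "she" clashes with "Mr. Clinton". A real system would replace the threshold with a trained anaphoricity component, as in the approaches of Denis and Baldridge (2008) or Rahman and Ng (2009) quoted above.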